ABSTRACT

The present invention is a computer-implemented system
and method that allow data from a first hierarchical data
structure to be applied to a second hierarchical data
structure. The method comprises recursively comparing the
source elements of the first hierarchical structure to the
target elements of the second hierarchical structure, and
applying the data from a source element or source child
element to a matching target element or target child element.
The method is iterated until all elements of the second
hierarchical data structure have been traversed.
201  DATA_TYPE = DATA_TYPE_NAME + [PARENT_DATA_TYPE_REF] + [(ELEMENT)*]

202  DATA_TYPE_NAME = a name that uniquely identifies this datatype from other
     datatypes

203  PARENT_DATA_TYPE_REF = a reference to another datatype using its
     DATA_TYPE_NAME. This value indicates that this datatype is a
     "descendent" of PARENT_DATA_TYPE_REF

204  ELEMENT = ELEMENT_NAME + [DATA_TYPE_REF] + [POSITIONAL_REFERENCE] +
     [ALIAS_NAME] + [(ELEMENT)*]

205  ELEMENT_NAME = a name that identifies this element

206  DATA_TYPE_REF = a reference to another datatype using its
     DATA_TYPE_NAME. This value indicates that the child structure of this
     element is at least equal to the child structure of the datatype
     referenced.

207  POSITIONAL_REFERENCE = ELEMENT_REF

208  ELEMENT_REF = a reference to a child element of the datatype specified by
     DATA_TYPE_REF in this element's parent element.

209  ALIAS_NAME = a reference to a child element of the datatype specified by
     DATA_TYPE_REF in this element's parent element. When specifying this
     value, it indicates that the element referred to by ALIAS_NAME is now
     replaced by ELEMENT_NAME

FIG. 2
<SimpleName/>

<CompoundName instanceOf="SimpleName">
  <First/>
  <Middle/>
  <Last/>
</CompoundName>

<ComplexName instanceOf="CompoundName">
  <Prefix insert="First"/>
  <Suffix/>
</ComplexName>

<FormattedName instanceOf="CompoundName">
  <Given1 alias="First"/>
  <Given2 alias="Middle"/>
  <Surname alias="Last"/>
</FormattedName>

<BusinessName instanceOf="SimpleName"/>

FIG. 5A



<Person>
  <Name instanceOf="CompoundName">
    <First/>
    <Middle/>
    <Last/>
  </Name>
  <Address instanceOf="CompoundAddress">
    <Street/>
    <City/>
    <State/>
    <Zip/>
  </Address>
  <DateOfBirth instanceOf="Date"/>
  <SSN/>
</Person>

FIG. 5B
Schema A

<CustomerInfo>
  <Name instanceOf="CompoundName">
    <First/>
    <Middle/>
    <Last/>
  </Name>
  <Address instanceOf="CompoundAddress">
    <Street/>
    <City/>
    <State/>
    <Zip/>
  </Address>
  <DOB/>
  <Income/>
  <CreditInfo>
    <Type/>
    <Number/>
  </CreditInfo>
</CustomerInfo>

Schema B

<Invoice>
  <Purchaser>
    <Name instanceOf="ComplexName">
      <First/>
      <Middle/>
      <Last/>
    </Name>
    <Address instanceOf="CanadianAddress">
      <Street/>
      <City/>
      <Province/>
      <PostalCode/>
    </Address>
    <CreditCard instanceOf="CreditCard">
      <Type/>
      <Name/>
      <Number/>
      <ExpiryDate/>
    </CreditCard>
  </Purchaser>
  <Product>
    <SKU/>
    <Description/>
    <Price/>
  </Product>
</Invoice>

FIG. 6
800

801  CONTEXT_MAP = SYMMETRIC_DESIGNATION + SOURCE + TARGET

802  SYMMETRIC_DESIGNATION = indicates whether a TARGET => SOURCE mapping is
     also implied

803  SOURCE = SCHEMA_CONTEXT

804  TARGET = SCHEMA_CONTEXT

805  SCHEMA_CONTEXT = PARENT_ELEMENT_CONTEXT + DELIMITER + ELEMENT_NAME

806  PARENT_ELEMENT_CONTEXT = SCHEMA_CONTEXT of the element's parent (if one
     exists)

807  DELIMITER = some known character value that doesn't appear in the
     ELEMENT_NAMEs that make up this context

FIG. 8A



<Map symmetric="true">
  <Source>CustomerInfo</Source>
  <Target>Invoice/Purchaser</Target>
</Map>

<Map symmetric="false">
  <Source>Incident/Suspect</Source>
  <Target>AuctionCompany/Auctions/Seller</Target>
</Map>

FIG. 8B
900

901  COMPARISON_ALGORITHM = ALGORITHM_NAME + IMPLEMENTATION_REFERENCE [+
     IMPLEMENTATION_PARAMETERS]

902  ALGORITHM_NAME = a unique way to identify this algorithm from other
     algorithms.

903  IMPLEMENTATION_REFERENCE = a means to identify an implementation of this
     algorithm. This may include, but is not limited to class name, function
     call names, and dynamically loadable libraries.

904  IMPLEMENTATION_PARAMETERS = a set of parameters used to configure this
     specific instance of implementation

FIG. 9A



<STRING_DIFFERENCE class="com.company.comparisons.StringDiffScore"/>

<SOUNDEX class="com.company.comparisons.SoundexScore"/>

<NAME_SYNONYM class="com.company.comparisons.SynonymScore">
  <SIMILAR degree="0.9">
    <ELEMENT>Robert</ELEMENT>
    <ELEMENT>Bob</ELEMENT>
    <ELEMENT>Rob</ELEMENT>
    <ELEMENT>Bobby</ELEMENT>
    <ELEMENT>Robby</ELEMENT>
  </SIMILAR>   910
  <SIMILAR degree="0.85">
    <ELEMENT>John</ELEMENT>
    <ELEMENT>Johnny</ELEMENT>
    <ELEMENT>Jon</ELEMENT>
    <ELEMENT>Juan</ELEMENT>
    <ELEMENT>Jack</ELEMENT>
  </SIMILAR>
</NAME_SYNONYM>

FIG. 9B
1000

TABLE OF COMPARISON TYPES USED IN STRATEGIES OF TTE

Context Comparison
  Inputs Received:    Source schema context.
                      Target schema context.
  Success Indicators: Existence of mapping specification (including
                      any symmetric versions) is found, using UDMS.

Element Comparison
  Inputs Received:    Two element names.
                      Name Comparison Algorithm.
                      Normalized threshold score.
  Success Indicators: Calling the Name Comparison Algorithm with
                      the two element names results in a normalized
                      score equal to or greater than the threshold score.

Attribute Comparison
  Inputs Received:    Two attribute values.
                      Attribute Comparison Algorithm.
                      Normalized threshold score.
  Success Indicators: Calling the Attribute Comparison Algorithm
                      with the two attribute values results in a
                      normalized score equal to or greater than the
                      threshold score.

Datatype Lineage Comparison
  Inputs Received:    Two Datatype Names.
                      Reference to Lineage Comparison Algorithm
                      that is registered with SSS.
                      Normalized threshold score.
  Success Indicators: Calling the Lineage Comparison Algorithm with
                      the two Datatype Names results in a normalized
                      score equal to or greater than the threshold score.

Datatype Tree/Structure Comparison
  Inputs Received:    Two hierarchical data structures.
                      Reference to Tree Comparison Algorithm
                      that is registered with SSS.
                      Normalized threshold score.
  Success Indicators: Calling the Tree Comparison Algorithm with the
                      two hierarchical data structures results in a
                      normalized score equal to or greater than the
                      threshold score.

FIG. 10
1100

TTE = (STRATEGY)*

STRATEGY = (COMPARISON_TYPE)*

COMPARISON_TYPE = CONTEXT_COMPARE | ELEMENT_COMPARE | DATATYPE_COMPARE |
ATTRIBUTE_COMPARE

CONTEXT_COMPARE = determines if a map exists in the User-Defined Mapping Services
for two SCHEMA_CONTEXTs (including a symmetric version).

ELEMENT_COMPARE = NAME_COMPARISON_ALGORITHM + THRESHOLD

ATTRIBUTE_COMPARE = ATTRIBUTE_NAME + NAME_COMPARISON_ALGORITHM + THRESHOLD

NAME_COMPARISON_ALGORITHM = a comparison algorithm registered in the Similarity
Scoring Services that compares two ELEMENT_NAMEs or two ATTRIBUTE_VALUEs and
returns a normalized score.

DATATYPE_COMPARE = LINEAGE_COMPARE | CHILD_STRUCTURE_COMPARE

LINEAGE_COMPARE = LINEAGE_COMPARISON_ALGORITHM + THRESHOLD

LINEAGE_COMPARISON_ALGORITHM = a comparison algorithm registered in the
Similarity Scoring Services that compares datatypes and returns a normalized
score that indicates the proximity of the data types in their family tree.

CHILD_STRUCTURE_COMPARE = TREE_COMPARISON_ALGORITHM + THRESHOLD

TREE_COMPARISON_ALGORITHM = a comparison algorithm registered in the Similarity
Scoring Services that compares two data hierarchies and returns a normalized
score based on the similarity of their child structures.

THRESHOLD = a normalized score indicating similarity or proximity.

FIG. 11
<TTE>
  <STRATEGY>
    <MAP/>
  </STRATEGY>
  <STRATEGY>
    <ELEMENT compare="exact" threshold="1.0"/>
    <DATATYPE compare="lineage" threshold="1.0"/>
    <ATTRIBUTE value="description" compare="exact" threshold="1.0"/>
  </STRATEGY>
  <STRATEGY>
    <ELEMENT compare="exact" threshold="1.0"/>
    <DATATYPE compare="lineage" threshold="1.0"/>
    <ATTRIBUTE value="description" compare="string_diff" threshold="0.[illegible]"/>
  </STRATEGY>
  <STRATEGY>
    <ELEMENT compare="exact" threshold="1.0"/>
    <DATATYPE compare="lineage" threshold="1.0"/>
  </STRATEGY>
  <STRATEGY>
    <ELEMENT compare="exact" threshold="1.0"/>
    <DATATYPE compare="lineage" threshold="0.5"/>
    <ATTRIBUTE value="description" compare="exact" threshold="1.0"/>
  </STRATEGY>
  <STRATEGY>
    <ELEMENT compare="exact" threshold="1.0"/>
    <DATATYPE compare="lineage" threshold="0.5"/>
    <ATTRIBUTE value="description" compare="string_diff" threshold="0.8"/>
  </STRATEGY>
  <STRATEGY>
    <ELEMENT compare="exact" threshold="1.0"/>
    <DATATYPE compare="lineage" threshold="0.5"/>
  </STRATEGY>
  <STRATEGY>
    <ELEMENT compare="string_diff" threshold="1.0"/>
    <DATATYPE compare="structure" threshold="1.0"/>
    <ATTRIBUTE value="description" compare="string_diff" threshold="0.8"/>
  </STRATEGY>
</TTE>

FIG. 12
Yes

1316

elements.

Success =

Return Success value.

FIG. 14A






Patent Application Publication Feb. 21, 2002 Sheet 19 of 20 US 2002/0023097 A1





FIG. 14B 







Load referenced Lineage Comparison Algorithm from SSS.

Load referenced Tree Comparison Algorithm from SSS.

1426

Pass element name values from Source and Target elements into
Lineage Comparison Algorithm.

1429

Pass attribute values from Source and Target elements into
Tree Comparison Algorithm.

1434

Success = false.

1435

Return Success value.

Set active comparison to next comparison in series.
SYSTEM AND METHOD FOR SHARING DATA
BETWEEN HIERARCHICAL DATABASES

CROSS REFERENCE TO RELATED 
APPLICATIONS 

[0001] Referenced-applications 

[0002] This application claims the benefit of U.S. Provi- 
sional Application 60/214,891, filed Jun. 29, 2000. 

BACKGROUND OF INVENTION 

[0003] The present invention relates generally to database 
management systems. More particularly, the invention is a 
computer-implemented method that allows data in different 
databases, which may have different formats and structures, 
to be shared without remodeling the data. The system and 
method provides for transforming one hierarchical data 
structure to another hierarchical data structure. 

[0004] Information resources often comprise huge data- 
bases that must be searched in order to extract useful 
information. One example of this includes data found on 
global information networks. With the wealth of information 
available today, and its value to businesses, managing infor- 
mation effectively has become a priority. However, existing 
database technologies, including recent advances in data- 
base integration, are often constrained when interacting with 
multiple, voluminous data sources. 

[0005] As a growing number of companies establish
Business-to-Business (B2B) and Business-to-Consumer (B2C)
relationships using a global communications network, such 
as the Internet, traditional data sharing among multiple large 
data sources has become increasingly problematic. Data 
required by businesses is often stored in multiple databases, 
or supplied by third party companies. Additionally, data 
sharing difficulties are often magnified as companies attempt 
to integrate internal and external databases. As a result, 
combining data from separate sources typically creates an 
expensive and time-consuming systems integration task. 

[0006] A major problem in data exchange arises from 
attempting to apply data associated with one structure, to 
another data structure. Table 1 shows two differing hierar- 
chical data structures. A hierarchical data structure usually 
contains root, interior and leaf nodes. Each node in the data 
structures may contain data, or the data may be contained
only in the lowest level nodes, referred to as leaf nodes.



TABLE 1

Structure A (with data)      Structure B (without data)

Suspect                      Offender
Name                         Identification
First = "John"               Name
Middle = "Q"                 Address
Last = "Public"              StreetNum
Address                      StreetName
Street = "123 Main"          City
City = "AnyTown"             State
State = "TX"                 ZipCode
Zip = "02334"





[0007] In order to facilitate the exchange of data, current
solutions include standards bodies and consortia that
standardize data structure. Standards bodies like RosettaNet,
BizTalk, OASIS, and ACORD attempt to standardize data so 
that it can be exchanged more easily. However, there are 
problems presented by these solutions. To participate in a 
consortium, all participants' data has to be modeled in the 
same manner. Additionally, consortia and standards bodies 
established to handle similar types of data often have 
different standards for specific industries. The adoption of 
standards is also slow, because businesses within each 
industry still modify data to fit their own company require- 
ments. Hence, given the number of different consortia, 
standards, and industries, there is still a need for a standard 
means to exchange data and data structure between different 
data structures and databases, among companies of the same 
and different industries, and even among departments of the 
same companies. 

[0008] One current approach to filling this need is to 
painstakingly map one field of data to another, in order to 
exchange the data with a "non-conformant" entity; that is, 
one that uses different data structure standards. This process 
must be repeated not only for every field but also for every 
different exchange. These solutions to the exchange problem 
are generally custom "hard-coded" solutions. An efficient, 
user-configurable method for sharing data between different 
data structures, by transforming one hierarchical data struc- 
ture to another, is still lacking. 

[0009] Technologies such as Structured Query Language
(SQL), Open Database Connectivity (ODBC) and Extensible
Markup Language (XML) have been developed to
facilitate data integration. As beneficial as these technolo- 
gies may be, however, they have failed to address inherent 
differences in the structure and organization of databases, in 
addition to the contents. These differences are important, 
because the richness of the original structure often contrib- 
utes to the value of its underlying data. 

[0010] For example, when attempting to store the same 
type of data or object, such as a customer description, 
database designers may use different field names, formats, 
and structures. Fields contained in one database may not be 
used in another. Or data that is stored in a single field in one 
database may be stored in several fields in another. If 
understood and logically integrated, these disparities can 
provide valuable information, such as how a company gains 
competitive advantage based on its data structuring. Unfor- 
tunately, today's database technologies often cleanse the 
disparities out of data to make it conform to standards of 
form and structure. Examples include databases that are 
converted from one representation to another representation 
and expressed in XML, using its corresponding hierarchical 
structure. 

[0011] Integrating data from multiple environments and 
formats into a single interoperable structure is particularly 
necessary to seamless B2B electronic commerce (e-Com- 
merce), and XML enables data to look much more alike than 
any previous format. However, there are still problems with 
using XML to represent data. These problems fall into two 
major categories: 1.) dirty and naturally occurring data
perplex XML searching and storage and 2.) data formats or 
data schemas in the original databases that offer competitive 
advantage or better reflect the true model of the business and 
its data, are sacrificed to standards consortia. This means that 
the database formats or schemas have to be fit into the 
consortia data standards, which requires a highly skilled
technical staff to spend a large amount of time comparing
one database schema to another. Moreover, the standards 
being used and developed to overcome these data exchange 
barriers sacrifice competitive advantage for interoperability. 
Today, businesses require both. 

[0012] Conforming to industry standards may also raise 
a number of other issues, such as intellectual property issues;
the ability for data modeled to a specific consortium stan- 
dard to communicate with other consortia that use a different 
model or standard; and the handling of legacy data in 
multiple formats. 

SUMMARY 

[0013] The present invention solves the aforementioned 
needs, by providing a system and method for data sharing, 
without requiring that the data be remodeled to fit a common 
format or convention. Data can be dynamically transformed 
from any hierarchical structure to any other, regardless of 
format. 

[0014] The present invention is a method for sharing data
between hierarchical databases, comprising defining,
configuring and storing datatypes, defining, configuring and
storing hierarchical data structures comprising the datatypes,
establishing and storing a lineage for linking related
datatypes into families, defining, configuring and storing
measures of similarity and similarity match tolerances,
defining, configuring and storing match strategies,
transforming a source hierarchical data structure to a target
hierarchical data structure by determining the similarity
between the source and target data structure, and evaluating
an effectiveness indicia of match strategies. The method may
further comprise manually defining, configuring and storing
mappings between datatype elements.

[0015] The present invention also provides a user-config- 
urable "tree transformation" system and method that 
employs a step-by-step process of elimination to take the
contents of one hierarchical data structure and apply them to 
a different structure. It allows for the use of a "dictionary" 
of common datatypes, which establishes a relationship hier- 
archy between datatypes so that datatype lineage may be 
used to facilitate the tree transformation process. The present 
invention has a user-definable "string similarity" comparator 
to establish the similarity of two strings, which may be used 
to facilitate the tree transformation process. It has a user- 
definable "structure similarity" comparator to establish the 
similarity of tree structures, which may be used to facilitate 
the tree transformation process. The present invention also 
has user-definable element pairing maps, which may be used 
to facilitate the tree transformation process. 

[0016] The invention provides a computer-implemented
method for applying data from a first hierarchical data
structure to a second hierarchical data structure, comprising
receiving a source element containing data from the first
hierarchical data structure and a target element from the
second hierarchical data structure, which is to contain the
transformed data. It is determined whether the source element
and target element have any child elements. Where the
source element has no child elements and the target element 
has no child elements, the data from the source element is 
copied to the target element. Where the source element has 
no child elements and the target element has at least one
child element, the data contained by the source element is
separated and applied to the at least one target child element. 
This may be accomplished via a best-fit algorithm, and the 
source element data may be separated into tokens that are 
applied to the target child elements. 
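
The separation step described above can be sketched as follows. This is a minimal Python illustration, not the best-fit algorithm of the invention: it assumes simple whitespace tokenization and assigns tokens to the target child elements in order, with the last child absorbing any leftover tokens. The function name `split_into_children` is hypothetical.

```python
def split_into_children(source_value: str, target_children: list[str]) -> dict[str, str]:
    """Split a leaf value into tokens and assign them to target child names in order."""
    tokens = source_value.split()
    result = {name: "" for name in target_children}
    for i, name in enumerate(target_children):
        if i == len(target_children) - 1:
            # Assumption: the last target child absorbs all remaining tokens.
            result[name] = " ".join(tokens[i:])
        elif i < len(tokens):
            result[name] = tokens[i]
    return result


# A single "Name" value applied to a target that splits names into three children.
filled = split_into_children("John Q Public", ["First", "Middle", "Last"])
print(filled)
```

A real best-fit step would likely score candidate assignments rather than rely on token order; the ordering rule here is only the simplest stand-in.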

[0017] Where the source element has at least one child 
element and the target element has no child elements, the 
data on the at least one child element of the source element 
is combined into one value and the value is applied to the 
target element. Where the source element has at least one 
child element and the target element has at least one child 
element, it must be determined whether a source child element
matches an unfilled target child element. This determination
may comprise setting a source child pointer to a
first source child element and determining if the first source 
child element and an unmarked target child element satisfy 
a first match strategy. Where the first match strategy is 
satisfied, the target child element is marked and the overall
invented method is reiterated, with the first source child
element received as the source element and the marked target
child element received as the target element. Where the first
strategy is not satisfied, it is determined whether at least one 
additional source child element exists. Where at least one 
additional source child element exists, the source child 
pointer is set to a next source child element and the step of 
determining whether each child element of the source ele- 
ment matches an unfilled child element of a target element 
is reiterated. 

[0018] Where no additional source child elements exist, it 
is determined whether at least one additional strategy exists. 
Where at least one additional strategy exists, the step of 
determining whether each child element of the source ele- 
ment matches an unfilled child element of target element is 
reiterated, using a next strategy. Where no additional strat- 
egies exist, a message is returned, indicating that no match 
is available between the first source child element and the at 
least one child of the target element. 
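
The matching loop of paragraphs [0017] and [0018] can be sketched as a recursive function. This Python sketch rests on several assumptions: the `Node` class, the `transform` function, and the two example strategies are hypothetical names; a strategy is modelled as a plain predicate over a source child and a target child; and the case of a childless source with a parent target (the token-separation case) is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    name: str
    value: Optional[str] = None
    children: list["Node"] = field(default_factory=list)


# Assumption: a strategy is a predicate deciding whether two child elements match.
Strategy = Callable[[Node, Node], bool]


def transform(source: Node, target: Node, strategies: list[Strategy]) -> list[str]:
    """Apply data from source to target; return names of unmatched source children."""
    unmatched: list[str] = []
    if not source.children and not target.children:
        target.value = source.value                      # leaf to leaf: copy
    elif source.children and not target.children:
        # Combine the data of the source children into one value.
        target.value = " ".join(c.value for c in source.children if c.value)
    elif source.children and target.children:
        marked: set[int] = set()                         # target children already paired
        pending = list(source.children)
        for strategy in strategies:                      # most accurate strategy first
            still_pending = []
            for s_child in pending:
                match = next((t for t in target.children
                              if id(t) not in marked and strategy(s_child, t)), None)
                if match is None:
                    still_pending.append(s_child)
                else:
                    marked.add(id(match))
                    unmatched += transform(s_child, match, strategies)
            pending = still_pending
        unmatched += [c.name for c in pending]           # no strategy produced a match
    return unmatched


def exact(s: Node, t: Node) -> bool:
    return s.name == t.name


def same_prefix(s: Node, t: Node) -> bool:
    return s.name.lower()[:3] == t.name.lower()[:3]


src = Node("Suspect", children=[
    Node("Name", children=[Node("First", "John"), Node("Last", "Public")]),
    Node("Zip", "02334"),
])
tgt = Node("Offender", children=[Node("Name"), Node("ZipCode")])
leftover = transform(src, tgt, [exact, same_prefix])
print(tgt, leftover)
```

The returned list plays the role of the "no match available" message: a caller could hand those names to a user-defined mapping facility.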

[0019] Where such a message is returned, the user may
explicitly define at least one element match between at least
one source element and at least one target element, via a
user-definable mapping services facility.

[0020] Where a source child element matches an unfilled 
target child element, the data of the source child element is 
applied to the unfilled target child element. The steps of the 
method are reiterated, until all elements of the second 
hierarchical data structure have been traversed. 

[0021] Strategies may be tried in order of decreasing
accuracy and may be stored in and retrieved from a
Similarity Score Services facility. A user may define the accuracy
of a match strategy. A match strategy comprises at least one
comparison utility, each comparison utility chosen from a
group consisting of a context comparison utility, an element
comparison utility, an attribute comparison utility, a lineage
datatype comparison utility, and a tree datatype comparison
utility.
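
One way to picture a match strategy is as a list of comparisons, each pairing a scoring function with a threshold. The Python sketch below makes assumptions that the text does not state: every comparison here operates on element names only, a strategy is taken to be satisfied when all of its comparisons meet their thresholds, and `prefix_overlap` is an invented scorer used purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Comparison:
    scorer: Callable[[str, str], float]   # returns a normalized score in [0, 1]
    threshold: float


def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def prefix_overlap(a: str, b: str) -> float:
    """Hypothetical scorer: shared-prefix length divided by the longer name's length."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b), 1)


def strategy_satisfied(comparisons: list[Comparison], source: str, target: str) -> bool:
    # Assumption: every comparison in the strategy must reach its threshold.
    return all(c.scorer(source, target) >= c.threshold for c in comparisons)


# Strategies listed in order of decreasing accuracy.
STRATEGIES = [
    [Comparison(exact, 1.0)],
    [Comparison(prefix_overlap, 0.4)],
]


def first_satisfied(source: str, target: str) -> Optional[int]:
    """Return the index of the first strategy the pair satisfies, or None."""
    for index, comparisons in enumerate(STRATEGIES):
        if strategy_satisfied(comparisons, source, target):
            return index
    return None


print(first_satisfied("Zip", "Zip"), first_satisfied("Zip", "ZipCode"), first_satisfied("Zip", "State"))
```

Keeping the threshold next to the scorer mirrors the `compare` and `threshold` attribute pairs of the XML configuration in FIG. 12.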

[0022] The current invention is also directed to a software
program embodied on a computer-readable medium,
incorporating the invented method.

[0023] The current invention is also directed to a
computer-based system for applying data from a first
hierarchical data structure to a second hierarchical data
structure. The system comprises a means for receiving at least
one source element from the first hierarchical data structure
and at least one target element from the second hierarchical
data structure, a means for determining whether source
elements and target elements have child elements, a means for
copying data from a source element to a target element, a
means for separating data from a source element and applying
the data to at least one child of a target element, a means for
comparing a child of a source element to a child of a target
element and determining a match, and a means for copying
data from a source child element to a target child element,
where a match is determined.

[0024] The system may further comprise a means for 
receiving datatypes from a user and for allowing the user to 
configure and define the datatypes. The system may further 
comprise a means for receiving explicit mappings that match 
at least one source element to at least one target element 
from a user and for allowing the user to configure and define the
mappings. The system may further comprise a means for
storing at least one match strategy and for allowing the user to
configure and define the at least one match strategy. 

BRIEF DESCRIPTION OF DRAWINGS 

[0025] These and other features, aspects and advantages of 
the present invention will become better understood with 
regard to the following description, appended claims, and 
accompanying drawings wherein: 

[0026] FIG. 1 is an architecture diagram of the present 
invention; 

[0027] FIG. 2 is an example of an embodiment of a typical 
formal data type specification of the present invention; 

[0028] FIG. 3A illustrates the Read Process of the Data 
Type Services facility; 

[0029] FIG. 3B illustrates the Write Process of the Data 
Type Services facility; 

[0030] FIG. 3C illustrates the Delete Process of the Data 
Type Services facility; 

[0031] FIG. 3D illustrates a first Locate Process of the 
Data Type Services facility; 

[0032] FIG. 3E illustrates a second Locate Process of the 
Data Type Services facility; 

[0033] FIG. 4A illustrates a first example of data type 
families; 

[0034] FIG. 4B illustrates a second example of data type 
families; 

[0035] FIG. 5A illustrates an example of XML data types 
for the first example data type family shown in FIG. 4A; 

[0036] FIG. 5B illustrates an example of XML data types 
for a more complex data type than that shown in FIG. 5A; 

[0037] FIG. 6 illustrates sample schemas defined in XML; 

[0038] FIG. 7A illustrates the Read Process of the User 
Defined Mapping Services Facility; 

[0039] FIG. 7B illustrates the Write Process of the User 
Defined Mapping Services Facility; 

[0040] FIG. 7C illustrates the Delete Process of the User 
Defined Mapping Services Facility; 



[0041] FIG. 7D illustrates the Locate Process of the User 
Defined Mapping Services Facility; 

[0042] FIG. 8A illustrates a formal user defined map 
specification; 

[0043] FIG. 8B illustrates a sample of user-defined map- 
pings defined in XML; 

[0044] FIG. 9A illustrates a formal definition of a Simi- 
larity Scoring Service configuration specification; 

[0045] FIG. 9B illustrates an example XML similarity 
scoring service configuration; 

[0046] FIG. 10 illustrates a table of comparison types, 
their required inputs, and their success indicators; 

[0047] FIG. 11 illustrates an example of a formal tree 
transformation engine configuration specification; 

[0048] FIG. 12 illustrates an example tree transformation 
engine configuration defined in XML; 

[0049] FIG. 13A illustrates a first portion of a flow 
diagram of a tree transformation Process; 

[0050] FIG. 13B illustrates a second portion of the flow 
diagram of FIG. 13A; 

[0051] FIG. 14A illustrates a first portion of a flow 
diagram of strategy evaluation; 

[0052] FIG. 14B illustrates a second portion of the flow 
diagram of FIG. 14A; and 

[0053] FIG. 14C illustrates a third portion of the flow 
diagram of FIG. 14A. 

DETAILED DESCRIPTION 

[0054] Please note that within this document, the terms
hierarchy and tree are used interchangeably and refer to
the same concept. Additionally, please note that every element
in a tree can have zero to N (0 ... N) number of
children, and every child in the tree has one parent. The root 
element of a tree has no parent. 

[0055] FIG. 1 illustrates an overview of the architecture of 
the present invention. A Data Dictionary Service (DDS) 100 
acts as a single point of access, with which users can
configure and access the sub-services that provide the func- 
tions of the Tree Transformation Engine (TTE) 101. The 
TTE 101 is a user-configurable facility that employs a 
step-by-step process to enable accurate transformation of 
one hierarchical data structure (a "Source tree") to another 
hierarchical data structure (a "Target tree"). The resultant 
tree has the structure of the Target tree but is populated with 
elements from the source tree. The TTE 101 systematically 
iterates through the data elements of the Source tree and 
attempts to find a best match pairing with data elements in 
the Target tree. At each level of the Source tree, the TTE tries 
a best match strategy first, then successively tries match 
strategies having diminishing accuracy, until all match strategies
are exhausted. The strategies and their ordering by importance
and accuracy are user-definable. When a
pairing is found that meets the requirements of the match 
strategy being employed, the pair of Source tree and Target 
tree data elements are fed back into the TTE 101. The 
matching process is run recursively until the entire Target 
tree has been traversed, resulting in a Target tree that
contains the data elements comprising data from the Source
tree. The TTE 101, as explained below, uses the other 
services of the DDS 100 to transform data from one hier- 
archical structure to another. 

[0056] The Datatype Services facility 102 provides defi- 
nition and storage of datatypes. Datatypes act as building 
blocks, with which users may 1) define and configure other 
datatypes, 2) define and configure hierarchical data struc- 
tures, and 3) establish lineages that link related datatypes 
into families. Such a lineage may also be called an
"inheritance model." During the tree transformation process, 
families of datatypes play a role in determining pairing of 
data elements. 
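
A datatype lineage can be pictured as a table of parent references, from which a proximity score between two datatypes may be derived. In the Python sketch below, the datatype names follow the name family of FIG. 5A, while the `PARENT` table layout, the `lineage_score` function, and the formula 1 / (1 + number of edges between the two datatypes) are assumptions chosen only to illustrate the idea of a normalized lineage measure.

```python
from typing import Optional

# Parent references, loosely modelled on the name datatypes of FIG. 5A.
PARENT: dict[str, Optional[str]] = {
    "SimpleName": None,
    "CompoundName": "SimpleName",
    "ComplexName": "CompoundName",
    "FormattedName": "CompoundName",
    "BusinessName": "SimpleName",
    "Date": None,
}


def ancestors(name: str) -> list[str]:
    """Return [name, parent, grandparent, ...] up to the root of the family."""
    chain = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def lineage_score(a: str, b: str) -> float:
    """Hypothetical proximity: 1 / (1 + edges between a and b), or 0.0 if unrelated."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, name in enumerate(chain_a):
        if name in chain_b:
            # First shared ancestor found; add the steps taken on each side.
            return 1.0 / (1 + steps_a + chain_b.index(name))
    return 0.0


print(lineage_score("ComplexName", "CompoundName"))
print(lineage_score("ComplexName", "Date"))
```

Under this formula identical datatypes score 1.0, a parent and child score 0.5, and datatypes from different families score 0.0, so a threshold can be compared against the result directly.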

[0057] It may sometimes be unclear how to relate certain 
datatypes into appropriate families, using the Datatype Ser- 
vices facility 102. During the tree transformation process, 
then, it may be impossible to determine how some data 
elements should be paired. The User-Defined Mapping 
Services facility 103 alleviates this difficulty. The User- 
Defined Mapping Services facility 103 allows custom con- 
figuration of data element maps, so that the pairing of data 
elements may be explicitly defined, as necessary. 

[0058] The Similarity Scoring Service 104 allows for the
configuration and registration of similarity scoring measures 
that can compare two objects and return a score based on 
their similarity. The measures of similarity and any match 
tolerances to be applied to certain match strategies used by 
the TTE 101 may be defined and configured by the user. 

[0059] FIG. 2 illustrates an example embodiment of a
typical formal data type specification 200 that may be 
defined and configured for the Datatype Services facility 
described with reference to FIG. 1. A datatype is a named 
entity that describes data structure. A datatype is formally 
defined in FIG. 2 at 201, as the combination of a Datatype 
Name 202, a Parent Datatype Reference 203, and an Ele- 
ment 204. The Datatype Name 202 uniquely identifies the 
datatype and distinguishes it from other datatypes. The 
Parent Datatype Reference 203 is a reference that indicates 
that the datatype being defined is a child of the parent 
datatype being referenced in the Parent Datatype Reference 
203. 

[0060] The Element 204 is a combination of an Element 
Name 205, a Datatype Reference 206, a Positional Refer- 
ence 207, an Alias Name 209, and the data Element 204 
itself. The Element Name 205 identifies the element and 
distinguishes it from others. A named datatype 201 can 
specify its structure by explicitly listing its child elements, 
by specifying a Datatype Reference 206 that indicates that 
the element's structure is the same as the referenced 
datatype, or by a combination of both. Where a Datatype 
Reference 206 is used, the Datatype Reference 206 refer- 
ences another datatype, using the name of the referenced 
datatype. The Datatype Reference 206 indicates that the 
child structure of the named Element 204 is equal to the 
child structure of the datatype that is referenced by the 
Datatype Reference 206. A Datatype Reference 206 found 
on an Element 204 of a datatype indicates that the Element
204 "includes" all of the structure of the referenced datatype.
This means that all of the child elements of the parent 
datatype are implicitly present in the named Datatype 201, 
without the user having to explicitly specify them. 

[0061] The Positional Reference 207 may also comprise
an Element Reference 208, which is a reference to a child of
the datatype specified in the datatype reference of the current
element's parent. Finally, the Alias Name 209 is a reference 
to a child element of the datatype specified in the datatype 
reference of the current element's parent. The specified 
value indicates that Element Name 205 replaces the element 
referred to by the Alias Name 209. 
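
The formal specification of FIG. 2 may be sketched, for illustration only, as two Python record types. The field names are hypothetical renderings of the FIG. 2 terms, and the example datatypes are those of the Name Family described with reference to FIG. 4A. This is a sketch under those assumptions, not the implementation of the Datatype Services facility.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """ELEMENT = name + [datatype ref] + (positional ref) + [alias] + [(ELEMENT)*]."""
    element_name: str
    data_type_ref: Optional[str] = None         # name of a referenced datatype
    positional_reference: Optional[str] = None  # inherited child to insert before
    alias_name: Optional[str] = None            # inherited child this element replaces
    children: List["Element"] = field(default_factory=list)


@dataclass
class DataType:
    """DATA_TYPE = name + [parent datatype ref] + [(ELEMENT)*]."""
    data_type_name: str
    parent_data_type_ref: Optional[str] = None
    elements: List[Element] = field(default_factory=list)


# Hypothetical instances modeled on the Name Family.
compound_name = DataType(
    "CompoundName",
    parent_data_type_ref="SimpleName",
    elements=[Element("First"), Element("Middle"), Element("Last")],
)
complex_name = DataType(
    "ComplexName",
    parent_data_type_ref="CompoundName",
    elements=[Element("Prefix", positional_reference="First"), Element("Suffix")],
)
```

Only the elements that a descendant introduces are listed explicitly; the inherited elements remain implicit and are resolved through the parent reference.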

[0062] FIGS. 3A-3E are flow diagrams of the processes 
included in the Datatype Services facility described with 
reference to FIG. 1. The Datatype Services facility provides 
a means to manage individual datatypes. It provides the 
functions to read, write, delete, and locate datatypes as 
depicted in FIGS. 3A-3E. 

[0063] FIG. 3A illustrates at 300 the Read Process of the 
Datatype Services facility described with reference to FIG. 
1. In accordance with step 301, the name of a datatype that 
is to be retrieved is input to the Datatype Services facility. 
In accordance with step 302, it is determined whether the 
datatype exists. If the datatype does not exist, then an error 
is returned, in accordance with step 303. If the datatype 
exists, then the datatype is returned, in accordance with step 
304. 

[0064] FIG. 3B illustrates at 310 the Write Process of the 
Datatype Services facility. In accordance with step 311, the
datatype that is to be saved is input to the Datatype Services
facility. In accordance with step 312, it is determined
whether the datatype already exists in a list of datatypes
maintained by the Datatype Services facility. If the datatype
does not exist, then the datatype is added to the list, in
accordance with step 313. If the datatype already exists, then
an error is returned, in accordance with step 314.

[0065] FIG. 3C illustrates at 320 the Delete Process of the 
Datatype Services facility. In accordance with step 321, the 
name of the datatype that is to be deleted is input to the 
Datatype Services facility. In accordance with step 322, it is 
determined whether the datatype exists in a list of datatypes 
maintained by the Datatype Services facility. If the datatype 
does not exist on the list, then an error is returned, in 
accordance with step 323. If the datatype does exist, then the 
datatype is deleted from the list, in accordance with step 324. 
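
The Read, Write, and Delete Processes of FIGS. 3A-3C may be sketched as a small in-memory registry. The class name, the exception type, and the use of a dictionary as the "list of datatypes" are assumptions made for illustration; the step numbers in the comments refer to the flow diagrams.

```python
class DatatypeError(Exception):
    """Raised wherever the flow diagrams return an error."""


class DatatypeServices:
    def __init__(self):
        self._datatypes = {}  # datatype name -> datatype definition

    def read(self, name):
        if name not in self._datatypes:                       # step 302
            raise DatatypeError(f"unknown datatype: {name}")  # step 303
        return self._datatypes[name]                          # step 304

    def write(self, name, definition):
        if name in self._datatypes:                           # step 312
            raise DatatypeError(f"already exists: {name}")    # step 314
        self._datatypes[name] = definition                    # step 313

    def delete(self, name):
        if name not in self._datatypes:                       # step 322
            raise DatatypeError(f"unknown datatype: {name}")  # step 323
        del self._datatypes[name]                             # step 324
```

Note that, as described, the Write Process rejects a datatype that already exists rather than overwriting it.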

[0066] FIG. 3D illustrates at 330 a first locate process of 
the Datatype Services facility, in which it is determined 
whether two datatypes are of a common family. In accor- 
dance with step 331, names for two datatypes are input. In 
accordance with step 332, it is then determined whether the 
datatypes exist. If they do not, then an error is returned, in 
accordance with step 333. If the datatypes do exist, then the 
levels of the tree of the first datatype are stepped through in 
a backwards (upwards) progression, in accordance with step 
334, until the root of the first tree is reached. Next, this 
stepwise procedure is performed for the tree of the second 
datatype, in accordance with step 335. After the roots of both 
trees are reached, it is determined whether the roots are the 
same, in accordance with step 336. If the roots are not the 
same, a message indicating that the two datatypes are not of
the same family is returned, in accordance with step 337. One
example of such a message is "false." If the roots are the 
same, a message indicating that the two datatypes are of the 
same family is returned, in accordance with step 338. One 
example of such a message is "true." 
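
The first locate process may be sketched as follows, assuming that each datatype records at most one parent. The PARENTS table is a hypothetical encoding of the Name and Address Families described with reference to FIGS. 4A and 4B.

```python
# Hypothetical parent table: datatype name -> parent name (None for a root).
PARENTS = {
    "SimpleName": None,
    "CompoundName": "SimpleName",
    "BusinessName": "SimpleName",
    "ComplexName": "CompoundName",
    "SimpleAddress": None,
    "CompoundAddress": "SimpleAddress",
    "ComplexAddress": "CompoundAddress",
}


def root_of(name, parents):
    """Step upward through the parent references until the root is reached."""
    while parents[name] is not None:
        name = parents[name]
    return name


def same_family(first, second, parents):
    if first not in parents or second not in parents:  # step 332
        raise KeyError("unknown datatype")             # step 333
    root_1 = root_of(first, parents)                   # step 334
    root_2 = root_of(second, parents)                  # step 335
    return root_1 == root_2                            # steps 336-338
```

Under these assumptions, `same_family("ComplexName", "BusinessName", PARENTS)` yields a true result, while comparing ComplexName with ComplexAddress yields a false result.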

[0067] FIG. 3E illustrates at 340 a second location process
of the Datatype Services facility, in which it is determined
whether two datatypes share a common ancestor. In
accordance with step 341, names for two datatypes (repre- 
sented as data type 1 and data type 2) are input. In accor- 
dance with step 342, it is then determined whether the 
datatypes exist. If they do not, then an error is returned, in 
accordance with step 343. If the datatypes do exist, then an 
active datatype setting is set to datatype 1, in accordance 
with step 344. In accordance with step 345, it is determined 
whether the active datatype is an ancestor of datatype 2. If 
the active datatype is an ancestor of datatype 2, then the 
active datatype is returned, in accordance with step 346. If 
the active datatype is not an ancestor of datatype 2, such as 
in the first iteration of the process (where the active data type 
is set to data type 1) and possibly in others, then it is 
determined whether the active datatype has a parent, in 
accordance with step 347. If the active datatype has no 
parent, then an error is returned, in accordance with step
348. If the active datatype has a parent, then the active
datatype setting is set to the parent, in accordance with step
349, and the common ancestor location process continues
with an additional iteration of steps 345-349. 
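
The loop of steps 344-349 may be sketched as follows. The sketch assumes that "ancestor" means a strict ancestor (a datatype is not its own ancestor) and reuses a hypothetical parent table for the Name and Address Families.

```python
PARENTS = {  # hypothetical: datatype name -> parent name (None for a root)
    "SimpleName": None,
    "CompoundName": "SimpleName",
    "BusinessName": "SimpleName",
    "ComplexName": "CompoundName",
    "FormattedName": "CompoundName",
    "SimpleAddress": None,
    "CompoundAddress": "SimpleAddress",
}


def is_ancestor(candidate, name, parents):
    """True if candidate appears on the parent chain above name."""
    current = parents[name]
    while current is not None:
        if current == candidate:
            return True
        current = parents[current]
    return False


def common_ancestor(datatype_1, datatype_2, parents):
    if datatype_1 not in parents or datatype_2 not in parents:  # step 342
        raise KeyError("unknown datatype")                      # step 343
    active = datatype_1                                         # step 344
    while True:
        if is_ancestor(active, datatype_2, parents):            # step 345
            return active                                       # step 346
        if parents[active] is None:                             # step 347
            raise LookupError("no common ancestor")             # step 348
        active = parents[active]                                # step 349
```

Because the active datatype climbs one level per iteration, the first ancestor returned is the nearest one shared by both datatypes.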

[0068] FIGS. 4A and 4B illustrate examples of how data 
may be interrelated to create datatype families. FIG. 4A 
shows at 400 the interrelation of data to create a Name 
Family. A Simple Name datatype 401 that identifies an 
individual or business apart from other objects in a data 
structure may be related to more specific data. Thus, the 
Simple Name datatype 401 may have descendants, such as 
a Compound Name datatype 402, which contains a first, 
middle, and last name for the individual; or a Business Name 
datatype 403, which contains only the name of the business. 
A Compound Name datatype 402 for an individual may be, 
in turn, a parent to more specific data, such as a Complex 
Name datatype 404. The Complex Name datatype 404 is 
made up of Prefix and Suffix explicitly and also contains 
First, Middle, Last implicitly. This is because the Complex 
Name datatype 404 is an instance of the Compound Name 
402 and "includes" structure 402. Another descendant of the 
Compound Name datatype 402 may be a Formatted Name 
datatype 405, which separates the name into given names 
and surnames. 

[0069] FIG. 4B shows at 410 the interrelation of data to 
create an Address Family. A Simple Address datatype 411 
that identifies an individual or business address apart from 
other objects within a data structure may be related to more 
specific data, such as a Compound Address datatype 412, 
which contains a street address, city, state, and zip code for 
the individual or business. The Compound Address datatype 
412 may be, in turn, related to more specific data, such as a 
Complex Address datatype 413, which separates the street 
address from the Compound Address datatype 412 into a 
street number, a street name, and a street direction. 

[0070] If renaming child elements is important, the use of 
aliases allows derivative datatypes to rename certain child 
elements. By using the ALIAS_NAME designation, as 
described with reference to FIG. 2, the "reintroduced"
element can change an ELEMENT_NAME, by maintaining 
an ELEMENT_NAME reference to the child element of the 
parent data type. This process is evident in FIG. 4B, for 
example, where the Canadian Address datatype 414 reintro- 
duces "State" and "Zip" as "Province" and "Postal Code". 
By using the aliases State and Zip, it maintains reference to 
the old elements in Compound Address datatype 412 and
creates an implicit element pair. This facilitates an element
pairing process, when transforming one data type structure 
to another in the same family. 
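
The effect of an alias may be sketched as follows. The element names Street, City, and PostalCode are hypothetical spellings chosen for illustration; the sketch renames the inherited children and records the implicit element pairs that result.

```python
def apply_aliases(inherited, aliased):
    """Rename inherited child elements and report the implicit pairs.

    inherited: child element names of the parent datatype, in order.
    aliased:   mapping of new ELEMENT_NAME -> ALIAS_NAME (the old name).
    """
    replaced_by = {old: new for new, old in aliased.items()}
    children = [replaced_by.get(name, name) for name in inherited]
    pairs = list(replaced_by.items())  # (old name, new name) pairs
    return children, pairs


children, pairs = apply_aliases(
    ["Street", "City", "State", "Zip"],
    {"Province": "State", "PostalCode": "Zip"},
)
```

The returned pairs give a later pairing step a direct link between each renamed element and the element it replaced.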

[0071] The embodiments shown in FIGS. 4A and 4B are 
given for illustrative purposes only, and are not intended to 
limit the scope of the current invention to certain applica- 
tions. It will be recognized by those skilled in the art that the 
invention is susceptible of other applications and purposes, 
without departing from the invention as a whole. 

[0072] An example of how these relationships may be
expressed in a text-based markup language, such as Exten-
sible Markup Language (XML), is illustrated in FIGS. 5A
and 5B. XML is used as an example throughout, in order to
illustratively explain certain concepts. Representation of a
data structure hierarchy in XML is a natural fit because XML
is itself a language used to define hierarchies. However, any
language suitable for representing the relationships in a data
structure may be used, without departing from the scope of
the current invention. Other textual markup languages, such
as SGML, and general object-oriented practices of compo-
sition may be used, wherein an object can contain other
objects, which in turn contain other objects, thereby creating
a hierarchy of nested objects.

[0073] The textual representation 500 in FIG. 5A illus-
trates even further the hierarchical structure of the Name
Family described with reference to FIG. 4A, and the inclu-
sion of parent datatypes within their descendant datatypes. If
the order of descendants of a datatype is significant, a user
can:

[0074] a) re-specify all of the parent datatype's ele- 
ments, along with its own, in the proper order. For 
example in XML: 



<ComplexName instanceOf = "CompoundName">
    <Prefix/>
    <First/>
    <Middle/>
    <Last/>
    <Suffix/>
</ComplexName>



[0075] b) specify a POSITIONAL_REFERENCE, as
described with reference to FIG. 2, that indicates
before which element this element is inserted. For
example in XML, the POSITIONAL_REFERENCE
"First" is added to the element "Prefix", to
indicate that the Prefix will be inserted before the
First name:



<ComplexName instanceOf = "CompoundName">
    <Prefix insert = "First"/>
    <Suffix/>
</ComplexName>



[0076] In example a), all elements are re-specified in
proper order. In example b), all introduced elements will be
appended after existing elements. First, Middle, and Last are
already existing prior to their inclusion in the Complex 
Name datatype. Thus, Suffix doesn't need an insert attribute. 
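
The ordering rule of example b) may be sketched as follows. The function name and the tuple representation of an introduced element are assumptions made for illustration.

```python
def resolve_children(inherited, introduced):
    """Merge introduced elements into the inherited child list.

    inherited:  child element names included from the parent datatype.
    introduced: (element name, insert_before) tuples; insert_before is
                None when the element carries no positional reference.
    """
    result = list(inherited)
    for name, insert_before in introduced:
        if insert_before is None:
            result.append(name)  # appended after the existing elements
        else:
            result.insert(result.index(insert_before), name)
    return result


order = resolve_children(
    ["First", "Middle", "Last"],
    [("Prefix", "First"), ("Suffix", None)],
)
# order == ["Prefix", "First", "Middle", "Last", "Suffix"]
```

The result matches the fully re-specified ordering of example a), which is why the two forms are interchangeable.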

[0077] The Data Type Services facility may act as a
repository from which users may build complex datatypes 
that include datatypes from other families, as shown in FIG. 
5B at 510. For example, a user may define a new data type 
called "Person" which introduces a new family. The Person 
data type may be made, for example, from a Compound 
Name data type and a Compound Address data type, both 
from other datatype families, as well as new datatypes, such 
as a Date data type that represents the person's date of birth, 
and a SSN data type that represents the person's social 
security number. 

[0078] Once a dictionary of data types has been estab- 
lished, the Data Type Service can act as a repository of 
known data types, from which users can build schemas like 
those shown in FIG. 6 at 600. A schema is a logical 
representation of a data hierarchy. In many respects, a 
schema may be thought of as a higher level version of a 
single datatype. However, though a data type acts as a 
building block, and hence defines structure, the data type 
will not ultimately contain data. A schema, on the other 
hand, is used to model real-world data. 

[0079] FIG. 6 shows example schemas that are defined in 
XML, as illustrations of how schemas may be defined and 
structured by incorporating and arranging datatypes from 
various families. For example, Schema A uses Name, 
Address, and a host of other identifying information to 
define a data structure for Customer Information about a 
particular customer. Rather than use all the information of 
the Name Family shown in FIG. 5A, for example, only the 
Compound Name and its data elements are used in the 
hierarchy of Schema A in FIG. 6. However, the Complex 
Name is used for Schema B. Similarly, only the Compound 
Address of the Address Family shown in FIG. 4B is used in 
Schema A for the address portion of the Customer Informa- 
tion. However, the Canadian Address is used in Schema B. 
Thus, various schemas can be developed using individual 
data types from various families. Schemas may be defined 
using other languages and representations than XML, and 
they may be defined and structured differently than the 
examples shown in FIG. 6, without departing from the 
scope of the current invention. 

[0080] FIGS. 7A-7D show flow diagrams of the User
Defined Mapping Services processes, described with refer-
ence to FIG. 1. The User Defined Mapping Service (UDMS)
facility allows for the storage and retrieval of explicit
element pairings. It provides the facilities to read, write,
delete, and locate user-defined mappings.

[0081] FIG. 7A illustrates at 700 the Read Process of the
UDMS facility. In accordance with step 701, the Source
schema context and Target schema context of the user
defined mapping specification ("mapping") that is to be
retrieved are input to the UDMS facility. In accordance with
step 702, it is determined whether the mapping specification
exists. If the mapping specification does not exist, then an 
error is returned, in accordance with step 703. If the mapping 
specification exists, then the mapping is returned, in accor- 
dance with step 704. 

[0082] FIG. 7B illustrates at 710 the Write Process of the 
UDMS facility. In accordance with step 711, the mapping 
specification that is to be saved is input to the UDMS 
facility. In accordance with step 712, it is determined
whether the mapping specification already exists in a list of
mapping specifications maintained by the UDMS facility. If
the mapping specification does not exist, then the mapping
specification that was input is added to the list, in accordance
with step 713. If the mapping specification already exists,
then an error is returned, in accordance with step 714. 

[0083] FIG. 7C illustrates at 720 the Delete Process of the 
UDMS facility. In accordance with step 721, the Source 
schema context and Target schema context of the mapping 
specification that is to be deleted are input to the UDMS 
facility. In accordance with step 722, it is determined 
whether the mapping specification exists in a list of mapping 
specifications maintained by the UDMS facility. If the 
mapping specification does not exist on the list, then an error 
is returned, in accordance with step 723. If the mapping
specification does exist, then the mapping specification is 
deleted from the list, in accordance with step 724. 

[0084] FIG. 7D illustrates at 730 a Location Process of the 
UDMS facility. In accordance with step 731, the Source and
Target schema contexts for the mapping specification that is
to be located are input. In accordance with step 732, the
Source schema context is input into the UDMS Read Pro-
cess described with reference to FIG. 7A. Then the Target
schema context is input to the UDMS Read Process. In
accordance with step 733, it is then determined whether the
mapping specification exists. If it does, then the mapping
specification is returned, in accordance with step 734. If it
does not exist, then the Target schema context is input into
the UDMS Read Process, followed by the Source schema
context. Hence, the two are entered in reverse order, in
accordance with step 735. In accordance with step 736, it is
determined whether the mapping specification is symmetric.
If it is symmetric, then the mapping specification is returned,
in accordance with step 737. If the mapping specification is
not symmetric, then an error is returned, in accordance with 
step 738. 
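
The Location Process may be sketched as follows. The class, the exception type, and the slash-separated context strings are hypothetical. The sketch also assumes that a reversed lookup that finds no mapping at all is treated like a non-symmetric mapping and results in an error, a case the flow description does not spell out.

```python
class MappingError(Exception):
    """Raised wherever the UDMS flow diagrams return an error."""


class UserDefinedMappingService:
    def __init__(self):
        # (source context, target context) -> Symmetric Designation
        self._maps = {}

    def write(self, source, target, symmetric=False):
        if (source, target) in self._maps:              # step 712
            raise MappingError("mapping already exists")  # step 714
        self._maps[(source, target)] = symmetric        # step 713

    def locate(self, source, target):
        if (source, target) in self._maps:              # steps 732-733
            return (source, target)                     # step 734
        # Step 735: repeat the lookup with the contexts reversed.
        if self._maps.get((target, source), False):     # step 736
            return (target, source)                     # step 737
        raise MappingError("no applicable mapping")     # step 738


udms = UserDefinedMappingService()
udms.write("A/Customer/Address/State", "B/Customer/Address/Province",
           symmetric=True)
found = udms.locate("B/Customer/Address/Province", "A/Customer/Address/State")
```

Here the reversed lookup succeeds only because the stored mapping carries a symmetric designation.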

[0085] FIG. 8A shows a formal user defined map speci- 
fication 800. The map specification 800 contains a definition 
of a Context Map 801, which includes a Symmetric Desig- 
nation; a Source element, from which data is to be con- 
verted; and a Target element, to which data is to be con- 
verted. The Symmetric Designation 802 indicates whether 
Target to Source mapping is also implied in the context map 
801. The Source 803 and Target 804 elements are both
defined in terms of their schema contexts. A Schema Context
805 includes a Parent Element Context, a Delimiter, and an 
Element Name. The Parent Element Context 806 comprises 
the schema context of the Element's parent. Because each 
Parent Element schema context also contains a parent ele- 
ment context of its own, the Source and Target elements will 
be recursively related to all elements in their respective 
schemas, from which they descend. Thus, full schema 
contexts can be seen for the Source and Target elements. The 
Delimiter 807 is a known character value that does not 
appear in any of the Element Names that make up the 
schema context for an element. This allows the Source and 
Target elements to be identified separately from each other, 
where other schema context aspects may appear the same. 
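
The recursive definition of a Schema Context may be sketched as follows. The slash delimiter and the element names are assumptions made for illustration; any character that never appears in an Element Name would serve.

```python
DELIMITER = "/"  # assumed delimiter; must not occur in any Element Name


def schema_context(path):
    """Build a schema context from element names ordered root to element."""
    if not path:
        raise ValueError("a schema context needs at least one element name")
    *parent_path, element_name = path
    if DELIMITER in element_name:
        raise ValueError(f"delimiter found in element name: {element_name}")
    if not parent_path:
        return element_name  # the root has no Parent Element Context
    # Parent Element Context + Delimiter + Element Name
    return schema_context(parent_path) + DELIMITER + element_name


context = schema_context(["CustomerInfo", "Name", "First"])
# context == "CustomerInfo/Name/First"
```

Rejecting names that contain the delimiter keeps each context string unambiguous when it is split back into its element names.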

[0086] An example of how user-defined context maps may
be expressed in XML is illustrated in FIG. 8B. Other
languages and representations than XML may be used, and
user defined context maps may be defined and structured
differently than the examples shown in FIG. 8B, without
departing from the scope of the current invention. First, the
Symmetric Designation is defined as "true" or "false," to
indicate whether Target to Source mapping is also implied in
the context map. The Source and Target elements are
defined, each of which relates to a schema context. In the
example shown in FIG. 8B, the Source and Target schema
contexts relate to the schemas A and B shown in FIG. 6.

[0087] The Similarity Scoring Service (SSS) of the current 
invention provides users with the ability to register various 
scoring strategies and comparison algorithms with the ser- 
vice. A comparison algorithm may comprise any suitable 
algorithm that compares two objects and returns a score 
based on their similarity. The objects compared may include, 
but are not limited to, strings, trees, and other more complex 
objects. Facilities in the SSS provide a means to add, 
remove, load and execute, and evaluate algorithms con-
tained in the SSS. Once registered, the algorithms may be
referenced by name. 

[0088] FIG. 9A illustrates a formal SSS configuration 
specification 900. A Comparison Algorithm 901 formally
includes an Algorithm Name, an Implementation Refer-
ence, and Implementation Parameters. The Algorithm Name
902 is a user-defined name that identifies the Comparison 
Algorithm 901 from other algorithms. The Implementation 
Reference 903 may comprise any suitable means for iden- 
tifying a particular implementation for the named Compari- 
son Algorithm 901, apart from other possible implementa- 
tions of the named Comparison Algorithm 901. Suitable 
implementation identifications may include, but are not 
limited to, class names, function call names, and dynami- 
cally loadable libraries. The Implementation Parameters 904 
are a set of user defined parameters that configure the 
identified Implementation 903 of the named Comparison 
Algorithm 901 in the specific instance of use.

[0089] FIG. 9B shows a sample of an XML similarity 
scoring service configuration 910. For example, in FIG. 9B, 
a Comparison Algorithm having the name NAME_SYNONYM
is used. The specific implementation for the algorithm is
identified by the character string
"com.company.comparisons.SynonymScore". This character
string denotes
a specific implementation of the NAME_SYNONYM algo- 
rithm. The SIMILAR degree is set at 0.9 for instances of the 
first name (Robert), and set to 0.85 for instances of the 
second name (John). In this manner, parameters are set for 
this specific implementation of the NAME_SYNONYM 
algorithm. Other languages and representations than XML 
may be used, and SSS configurations may be defined and 
structured differently than the examples shown in FIG. 9B, 
without departing from the scope of the current invention. 
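
A registry in the spirit of the SSS may be sketched as follows. The class and function names are hypothetical, Python callables stand in for the Implementation Reference, and the synonym pairs (Robert/Bob, John/Jack) are invented for illustration; only the scores 0.9 and 0.85 come from the description of FIG. 9B.

```python
class SimilarityScoringService:
    def __init__(self):
        self._algorithms = {}  # Algorithm Name -> (implementation, parameters)

    def register(self, name, implementation, **parameters):
        self._algorithms[name] = (implementation, parameters)

    def remove(self, name):
        del self._algorithms[name]

    def score(self, name, first, second):
        """Execute the named algorithm with its configured parameters."""
        implementation, parameters = self._algorithms[name]
        return implementation(first, second, **parameters)


def exact(first, second):
    return 1.0 if first == second else 0.0


def synonym_score(first, second, synonyms=None):
    """Score 1.0 for identical names, else a configured synonym score."""
    if first == second:
        return 1.0
    return (synonyms or {}).get(frozenset((first, second)), 0.0)


sss = SimilarityScoringService()
sss.register("exact", exact)
sss.register(
    "NAME_SYNONYM",
    synonym_score,
    synonyms={
        frozenset(("Robert", "Bob")): 0.9,   # invented synonym pair
        frozenset(("John", "Jack")): 0.85,   # invented synonym pair
    },
)
```

Once registered, an algorithm is referenced only by name, so a strategy can name "NAME_SYNONYM" without knowing how it is implemented.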

[0090] The TTE described with reference to FIG. 1 pro- 
vides a facility where the user can enter a source data 
hierarchy that contains data on its elements, and a target data 
hierarchy that contains only structure. The user can then 
expect, as a result, the target structure populated with the 
data from the source structure. The TTE may be configured 
by the user to determine the necessary steps, the order of the 
steps, and the algorithms used, to facilitate the automated 
transformation of hierarchical data from one structure to 
another. 

[0091] The configuration of the TTE can be expressed as 
a series of strategies. Strategies may be ordered within the
TTE by the user in any suitable way. In one embodiment, the
strategies are ordered from most accurate to least accurate. 
The accuracy of a strategy may be measured by the number 
of successful comparisons, relative to a total number of 
comparisons performed. 

[0092] A strategy is a collection of comparisons, which 
can take the following forms: Context Comparison; Element 
Comparison; Data Type Comparison; Attribute Comparison. 

[0093] FIG. 10 illustrates at 1000 the various comparison 
types, the inputs required for each, and the indicators for 
evaluating a comparison as successful. The comparison 
types are not listed in any particular order. 

[0094] A Context Comparison takes as input a Source 
schema context and a Target schema context and asks the 
UDMS, described with reference to FIGS. 7A-7D, if a map 
exists for these two contexts (including a symmetric ver- 
sion). The existence of a map specification results in a 
successful comparison. 

[0095] An Element Comparison takes as input two Ele- 
ment Names, a reference to a Name Comparison Algorithm 
registered with the SSS, as described with reference to 
FIGS. 9A-9B, and a normalized threshold score. The com- 
parison evaluates successfully, if calling the Name Com- 
parison Algorithm with the Element Names results in a 
normalized score greater than or equal to the threshold score.

[0096] An Attribute Comparison is similar to an Element
Comparison. The Attribute Comparison takes as input two
Attribute Values, a reference to an Attribute Comparison
Algorithm registered with the SSS, and a normalized thresh-
old score. The comparison evaluates successfully, if calling
the Attribute Comparison Algorithm with the Attribute
Values results in a normalized score greater than or equal to
the threshold score. 
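
The threshold test shared by the Element and Attribute Comparisons may be sketched as follows. The scoring function is an assumption: it uses the standard-library difflib ratio as one possible normalized similarity measure, and any registered algorithm returning a score between 0 and 1 could take its place.

```python
from difflib import SequenceMatcher


def name_similarity(first, second):
    """One possible normalized score in [0, 1], ignoring letter case."""
    return SequenceMatcher(None, first.lower(), second.lower()).ratio()


def comparison_succeeds(algorithm, first, second, threshold):
    """A comparison succeeds when the score meets or exceeds the threshold."""
    return algorithm(first, second) >= threshold


same = comparison_succeeds(name_similarity, "Zip", "zip", 1.0)
different = comparison_succeeds(name_similarity, "Zip", "Province", 0.9)
```

A threshold of 1.0 demands an exact match under the chosen measure, while lower thresholds admit progressively looser pairings.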

[0097] A Datatype Comparison can take one of two forms: 
a Lineage Comparison and a Structure Comparison. 

[0098] A Data Type Lineage Comparison takes as input 
two Data Type Names, a reference to a Lineage Comparison 
Algorithm that is registered with the Similarity Scoring
Services, and a normalized threshold score. The comparison 
evaluates successfully if calling the Lineage Comparison 
Algorithm with the two Datatype Names results in a score 
greater than or equal to the threshold. 

[0099] A Datatype Structure Comparison takes as input 
two hierarchical data structures, a reference to a Tree Com- 
parison Algorithm that is registered with the Similarity 
Scoring Services, and a normalized threshold score. The 
comparison evaluates successfully if calling the Tree Com- 
parison Algorithm with the two hierarchies results in a score 
greater than or equal to the threshold. 
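
The two forms of Datatype Comparison may be sketched as follows. Both scoring formulas are assumptions chosen for illustration: lineage proximity is scored as 1/(1 + d), where d is the number of parent steps separating the two datatypes through their nearest shared datatype, and structure similarity is scored as the overlap of child element names.

```python
PARENTS = {  # hypothetical: datatype name -> parent name (None for a root)
    "SimpleName": None,
    "CompoundName": "SimpleName",
    "ComplexName": "CompoundName",
    "FormattedName": "CompoundName",
    "SimpleAddress": None,
    "CompoundAddress": "SimpleAddress",
}


def chain(name, parents):
    """The datatype followed by all of its ancestors, nearest first."""
    result = [name]
    while parents[result[-1]] is not None:
        result.append(parents[result[-1]])
    return result


def lineage_score(first, second, parents):
    chain_1, chain_2 = chain(first, parents), chain(second, parents)
    for steps_1, name in enumerate(chain_1):
        if name in chain_2:
            distance = steps_1 + chain_2.index(name)
            return 1.0 / (1.0 + distance)
    return 0.0  # different families


def structure_score(children_1, children_2):
    """Share of child element names that the two structures have in common."""
    set_1, set_2 = set(children_1), set(children_2)
    if not set_1 and not set_2:
        return 1.0
    return len(set_1 & set_2) / len(set_1 | set_2)
```

With a threshold of 1.0, the lineage form accepts only identical datatypes under this scoring, so looser strategies would use a lower threshold.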

[0100] FIG. 11 illustrates a formal specification of a TTE 
configuration 1100. As explained, the TTE 1101 includes at 
least one strategy, and each strategy 1102 includes at least 
one comparison type. Each comparison type 1103 may 
comprise a Context Comparison, Element Comparison, 
Attribute Comparison, or Datatype Comparison. A Context
Comparison 1104 asks the UDMS if a map exists for two 
schema contexts (including a symmetric version). An Ele- 
ment Comparison 1105 includes a Name Comparison Algo- 
rithm and a normalized threshold score for determining 
similarity. An Attribute Comparison 1106 includes an 
Attribute Value, a Name Comparison Algorithm, and a
normalized threshold score for determining similarity. A 
Name Comparison Algorithm 1107 is registered with the 
SSS. It compares two element names or attribute values and 
returns a normalized score. 

[0101] A Datatype Comparison 1108 may comprise either 
a lineage comparison or a child structure comparison. A 
Lineage Comparison 1109 includes a Lineage Comparison 
Algorithm and a threshold score for determining similarity. 
The Lineage Comparison Algorithm 1110 is a comparison 
algorithm registered in the Similarity Scoring Services that
compares datatypes and returns a normalized score that
indicates the proximity of the datatypes in their family tree.
A Child Structure Comparison 1111 includes a Tree Com-
parison Algorithm and a threshold score for determining
similarity. The Tree Comparison Algorithm 1112 is a com- 
parison algorithm registered in the Similarity Scoring Ser- 
vices that compares two data hierarchies and returns a 
normalized score based on the similarity of their child 
structures. A threshold 1113, as described with reference to 
the comparison types, comprises a normalized score that 
indicates similarity or proximity. 

[0102] FIG. 12 illustrates an example of how the TTE
may be configured using XML. For each strategy of the 
TTE, a comparison type is specified. For instance, where 



<STRATEGY>
    <MAP/>
</STRATEGY>



[0103] is shown, a Context Comparison is identified, refer- 
ring to the use of the User Defined Mapping Service 
(UDMS). Alternatively, where 



<STRATEGY>
    <ELEMENT compare = "exact" threshold = "1.0"/>
    <DATATYPE compare = "lineage" threshold = "1.0"/>
    <ATTRIBUTE value = "description" compare = "exact"
        threshold = "1.0"/>
</STRATEGY>



[0104] is shown, the TTE strategy includes an Element 
Comparison referring to "exact" as the Name Comparison
Algorithm to be used and having a normalized threshold 
score of 1.0; a Datatype Comparison specifying a Lineage 
Comparison and having a normalized threshold score of 1.0; 
and an Attribute Value Comparison, specifying "descrip- 
tion" as the attribute, referring to "exact" as the comparison 
algorithm to be used, and having a normalized threshold 
score of 1.0. Other languages and representations than XML 
may be used, and TTE configurations may be defined and 
structured differently than the examples shown in FIG. 12, 
without departing from the scope of the current invention. 
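
A configuration of this kind may be read into an ordered list of strategies as sketched below, using the Python standard-library XML parser. The enclosing TTE root element is an assumption, since the text shows only the individual STRATEGY fragments.

```python
import xml.etree.ElementTree as ET

# The <TTE> wrapper is hypothetical; the STRATEGY bodies follow the text.
CONFIG = """
<TTE>
  <STRATEGY>
    <MAP/>
  </STRATEGY>
  <STRATEGY>
    <ELEMENT compare="exact" threshold="1.0"/>
    <DATATYPE compare="lineage" threshold="1.0"/>
    <ATTRIBUTE value="description" compare="exact" threshold="1.0"/>
  </STRATEGY>
</TTE>
"""


def load_strategies(xml_text):
    """Return the strategies in document order, each a list of comparisons."""
    strategies = []
    for strategy in ET.fromstring(xml_text).findall("STRATEGY"):
        comparisons = []
        for node in strategy:
            settings = dict(node.attrib)
            if "threshold" in settings:
                settings["threshold"] = float(settings["threshold"])
            comparisons.append((node.tag, settings))
        strategies.append(comparisons)
    return strategies


strategies = load_strategies(CONFIG)
```

Document order is preserved, so the first strategy in the file is the first one tried.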

[0105] A flow chart of the tree transformation can be 
found in FIGS. 13A-13B. The process starts in FIG. 13A, in 
accordance with step 1301, by setting the active source 
element and the active target element. The active source 
element is set to an element from a Source data hierarchy. 
The active target element is set to an element from a Target
data hierarchy, to which the Source element is to be con-
verted or paired. In accordance with step 1302, it is deter-
mined whether the active source element has any children.

[0106] If the source element has no children, then it is 
determined whether the active target element has any chil- 
dren, in accordance with step 1303, and the data contained 
in the active source element is applied to the target in one of 
two ways. If neither the active source element, nor the active 
target element, has children, then the data is applied directly 
to the target element, in accordance with step 1304. If the
source element has no children, but the target element does have
children, then the data is "tokenized," or broken apart, and 
distributed among the child elements of the target using a 
Decomposition Algorithm, in accordance with step 1305. 
The Decomposition Algorithm 1305 may comprise any 
algorithm suitable for applying data tokens to child elements 
of a hierarchical data structure. 
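
One possible Decomposition Algorithm may be sketched as follows. It is an assumption for illustration only: the data is split on whitespace, tokens are assigned to the target children in order, and any surplus tokens are joined into the last child. The child element names are hypothetical.

```python
def decompose(value, child_names):
    """Distribute whitespace-separated tokens among the target children."""
    result = {name: None for name in child_names}
    if not child_names:
        return result
    tokens = value.split()
    leading, last = child_names[:-1], child_names[-1]
    for name, token in zip(leading, tokens):
        result[name] = token
    remainder = tokens[len(leading):]
    if remainder:
        result[last] = " ".join(remainder)  # surplus tokens stay together
    return result


address = decompose("123 Main Street", ["StreetNumber", "StreetName"])
# address == {"StreetNumber": "123", "StreetName": "Main Street"}
```

When there are fewer tokens than children, the trailing children are simply left without data.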

[0107] If it is determined in step 1302 that the active 
source element does have children, then it is determined 
whether the active target element has any children, in 
accordance with step 1306. If the active source element has 
children but the active target element does not have children, 
then the data on the children of the active source element is 
concatenated into one value and applied to the active target 
element, in accordance with step 1307. If both the active 
source element and the active target element have children, 
then a series of strategies are evaluated on each of the source 
element children, attempting to find a pair matching for the 
children of the target. The active strategy is then set to the 
best strategy, in accordance with step 1308. The best strategy 
is the first strategy in the strategies of the TTE that have been 
defined and ordered by the user, as described previously. In 
one embodiment, the strategies are ordered according to 
accuracy, and the best strategy comprises the most accurate 
strategy. 

[0108] FIG. 13B illustrates the pairing process. The best 
strategy is evaluated on the first child of the active source 
element, to find a matching pair with a target element. In 
accordance with step 1309, an active source child pointer is 
set to the first child of the active source element. In accor- 
dance with step 1310, it is then determined if an unmarked 
child of the target element satisfies the active strategy. 

[0109] If an unmarked child of the target element does 
satisfy the active strategy, then the target child is marked, in 
accordance with step 1311. In accordance with step 1312, 
the active source element is then set to the child of the source 
element to which the pointer is set, and the active target 
element is set to the marked target child. The Tree Trans- 
formation Process is then reiterated, beginning with step 
1302. 

[0110] If no unmarked child of the target element satisfies 
the active strategy, then it is determined whether there is 
another child of the active source element, in accordance 
with step 1313. If so, then the active source child pointer is 
set to the next child of the source element, in accordance 
with step 1314, and the pairing process is reiterated, begin- 
ning with step 1310. If there are no other children of the 
source element, then it is determined whether there are other 
strategies available besides the active strategy, in accordance 
with step 1315. If so, then the active strategy is reset to the 
next best strategy, in accordance with step 1316. The active 
child pointer is then set to the first child of the source 
element, for use with the new strategy, in accordance with
step 1309, and the pairing process is reiterated, beginning at 
step 1310. If there are no other strategies available, then it 
is determined that no element pairing is available, in accor- 
dance with step 1317, and the pairing process and Tree 
Transformation Process are terminated. 

[0111] As shown, the pair matching function is an iterative 
process that continues, until either all the source children 
have been matched, or all the strategies have been 
exhausted. 
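
A simplified sketch of the pairing loop of steps 1309 through 1317 follows. It assumes strategies are plain callables that take a source child and a target child and return true or false, and that children are represented by their names; both are illustrative assumptions, and the recursion into the Tree Transformation Process after a match is omitted.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

# Assumption: a strategy is a predicate over (source child, target child).
Strategy = Callable[[str, str], bool]


def find_pair(
    source_children: Sequence[str],
    target_children: Sequence[str],
    marked: Set[int],
    strategies: Sequence[Strategy],
) -> Optional[Tuple[int, int]]:
    """Return indices of the first (source child, target child) pair found, or None."""
    for strategy in strategies:                       # best strategy first (1308, 1316)
        for s_idx, s in enumerate(source_children):   # walk source children (1309, 1314)
            for t_idx, t in enumerate(target_children):
                # only unmarked target children are candidates (1310)
                if t_idx not in marked and strategy(s, t):
                    marked.add(t_idx)                 # mark the target child (1311)
                    return s_idx, t_idx
    return None                                       # no pairing available (1317)


exact = lambda s, t: s == t
ignore_case = lambda s, t: s.lower() == t.lower()

marked: Set[int] = set()
first = find_pair(["zip", "City"], ["city", "ZIP"], marked, [exact, ignore_case])
second = find_pair(["zip", "City"], ["city", "ZIP"], marked, [exact, ignore_case])
third = find_pair(["zip", "City"], ["city", "ZIP"], marked, [exact, ignore_case])
```

Because the strategies are tried in their stored order, a looser strategy is consulted only after a stricter one fails for every source child.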

[0112] FIGS. 14A-14C illustrate how a strategy is evaluated
during the process shown by FIGS. 13A-13B. A
strategy evaluation is initiated in FIG. 14 at step 1401, by 
passing the strategy engine a pair of elements: one from the 
Source data hierarchy, and one from the Target data hierar- 
chy. As described previously, a strategy is a series of 
comparisons. Each comparison in the series is evaluated in 
order. An active comparison is set to the first comparison in 
the series, in accordance with step 1402. The pair of elements
is assumed to satisfy the comparison until a comparator
deems them unsuccessful. Thus, in accordance with
step 1403, the Success value is initially set at "true." A
determination is then made as to the type of the active 
comparison. In accordance with step 1404, it is determined 
whether the active comparison is a Context Comparison. If 
so, then in accordance with step 1405, it is determined 
whether a mapping exists for the two elements in the 
UDMS, as described with reference to FIG. 10. If a mapping 
exists, then the Success value remains at "true," in accor- 
dance with step 1406. In accordance with step 1407, it is 
then determined whether there are more comparisons in the 
series of the strategy. If there are more comparisons in the 
series, then the active comparison is set to the next com- 
parison in the series, in accordance with step 1408, and the 
strategy evaluation process is reiterated, beginning with step 
1403. If no more comparisons exist, then the Success value 
of "true" is returned, in accordance with step 1410. 

[0113] If it is determined in step 1405 that a mapping does 
not exist for the elements, then the Success value for the 
Context Comparison is set at "false," in accordance with 
step 1409. The Success value of "false" is then returned, in 
accordance with step 1410. 
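
The evaluation loop and the Context Comparison can be sketched as below. The sketch assumes that user-defined mappings are available as a set of (source name, target name) pairs and that each comparison is a callable; the actual UDMS interface is not shown in this description, so these names are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

# Assumption: a comparison is a predicate over (source name, target name).
Comparison = Callable[[str, str], bool]


def context_comparison(mappings: Set[Tuple[str, str]]) -> Comparison:
    """Build a comparison that succeeds when a user-defined mapping exists (step 1405)."""
    def compare(source_name: str, target_name: str) -> bool:
        return (source_name, target_name) in mappings
    return compare


def evaluate_strategy(comparisons: Iterable[Comparison], source: str, target: str) -> bool:
    """Evaluate each comparison in order; stop at the first one that fails."""
    success = True                      # assumed successful until shown otherwise (1403)
    for comparison in comparisons:      # advance through the series (1407, 1408)
        if not comparison(source, target):
            success = False             # a failed comparison ends the evaluation (1409)
            break
    return success                      # returned to the caller (1410)


mappings = {("Surname", "LastName")}
strategy = [context_comparison(mappings)]
matched = evaluate_strategy(strategy, "Surname", "LastName")
unmatched = evaluate_strategy(strategy, "Surname", "City")
```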

[0114] If it is determined in step 1404 that the active 
comparison is not a Context Comparison, then it is deter- 
mined whether the active comparison is an Element Com- 
parison, in accordance with step 1411. If so, then the Name 
Comparison Algorithm that is referenced by the Element 
Comparison, as described at FIG. 10, is loaded from the 
SSS, in accordance with step 1412. The element name 
values from the Source and Target elements are then passed 
into the Name Comparison Algorithm, in accordance with 
step 1413. 

[0115] In accordance with step 1417, it is then determined 
whether the score returned by the Name Comparison Algo- 
rithm equals or exceeds the threshold defined by the Element 
Comparison. If so, then the Success value remains at "true," 
in accordance with step 1418. In accordance with step 1419, 
it is then determined whether there are more comparisons in 
the series of the strategy. If there are more comparisons in 
the series, then the active comparison is set to the next 
comparison in the series, in accordance with step 1420, and 
the strategy evaluation process is reiterated, beginning with
step 1403. If no more comparisons exist, then the Success
value of "true" is returned, in accordance with step 1422. If 
it is determined in step 1417 that the score returned by the 
Name Comparison Algorithm is lower than the threshold 
score, then the Success value is set to "false," in accordance 
with step 1421, and is returned in accordance with step 1422. 
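
The threshold test of step 1417 can be illustrated with a short sketch. The description does not specify the Name Comparison Algorithm, so the standard library's `difflib.SequenceMatcher` is swapped in here purely as a stand-in, and the case folding and threshold values are assumptions.

```python
from difflib import SequenceMatcher


def name_score(source_name: str, target_name: str) -> float:
    """Return a similarity score in [0, 1]; stand-in for a Name Comparison Algorithm."""
    # assumption: names are compared case-insensitively
    return SequenceMatcher(None, source_name.lower(), target_name.lower()).ratio()


def element_comparison(source_name: str, target_name: str, threshold: float) -> bool:
    """Succeed when the score equals or exceeds the configured threshold (step 1417)."""
    return name_score(source_name, target_name) >= threshold


same = element_comparison("LastName", "lastname", threshold=1.0)
different = element_comparison("LastName", "City", threshold=0.8)
```

Any scoring function that returns a comparable number could replace `name_score` without changing the threshold logic.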

[0116] If it is determined in step 1411 that the active 
comparison is not an Element Comparison, then it is deter- 
mined whether the active comparison is an Attribute Com- 
parison, in accordance with step 1414. If so, then the Name 
Comparison Algorithm that is referenced by the Attribute 
Comparison, as described at FIG. 10, is loaded from the 
SSS, in accordance with step 1415. The attribute values from 
the Source and Target elements are then passed into the 
Name Comparison Algorithm, in accordance with step 1416. 

[0117] In accordance with step 1417, it is then determined 
whether the score returned by the Name Comparison Algo- 
rithm equals or exceeds the threshold defined by the 
Attribute Comparison. If so, then the Success value remains 
at "true," in accordance with step 1418. In accordance with 
step 1419, it is then determined whether there are more 
comparisons in the series of the strategy. If there are more 
comparisons in the series, then the active comparison is set 
to the next comparison in the series, in accordance with step 
1420, and the strategy evaluation process is reiterated, 
beginning with step 1403. If no more comparisons exist, 
then the Success value of "true" is returned, in accordance 
with step 1422. If it is determined in step 1417 that the score 
returned by the Name Comparison Algorithm is lower than 
the threshold score, then the Success value is set to "false," 
in accordance with step 1421, and is returned in accordance 
with step 1422. 

[0118] If it is determined in step 1414 that the active 
comparison is not an Attribute Comparison, then it is deter- 
mined in step 1423 whether the active comparison is a 
Datatype Comparison. If so, then it is determined in step 
1424 whether the Datatype Comparison is a Lineage Com- 
parison. If so, then the Lineage Comparison Algorithm that 
is referenced by the Lineage Comparison, as described at 
FIG. 10, is loaded from the SSS, in accordance with step 
1425. The datatype names from the Source and Target 
elements are then passed into the Lineage Comparison 
Algorithm, in accordance with step 1426. 

[0119] In accordance with step 1430, it is then determined 
whether the score returned by the Lineage Comparison 
Algorithm equals or exceeds the threshold defined by the 
Lineage Comparison. If so, then the Success value remains 
at "true," in accordance with step 1431. In accordance with 
step 1432, it is then determined whether there are more
comparisons in the series of the strategy. If there are more 
comparisons in the series, then the active comparison is set 
to the next comparison in the series, in accordance with step 
1433, and the strategy evaluation process is reiterated, 
beginning with step 1403. If no more comparisons exist, 
then the Success value of "true" is returned, in accordance 
with step 1435. If it is determined in step 1430 that the score 
returned by the Lineage Comparison Algorithm is lower 
than the threshold score, then the Success value is set to 
"false," in accordance with step 1434, and is returned in 
accordance with step 1435. 
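
One possible Lineage Comparison Algorithm is sketched below. It assumes that datatype lineage is stored as a hypothetical child-to-parent dictionary without cycles, and that the score is 1.0 when both datatypes share a root and 0.0 otherwise; the scoring scale is an assumption, not something the description fixes.

```python
from typing import Dict, Optional


def root_of(datatype: str, parents: Dict[str, Optional[str]]) -> str:
    """Walk up the family tree of a datatype until a datatype with no parent is reached."""
    current = datatype
    while parents.get(current) is not None:  # assumption: the lineage contains no cycles
        current = parents[current]
    return current


def lineage_score(source_type: str, target_type: str,
                  parents: Dict[str, Optional[str]]) -> float:
    """Score 1.0 when both datatypes belong to the same family, else 0.0."""
    same_family = root_of(source_type, parents) == root_of(target_type, parents)
    return 1.0 if same_family else 0.0


# Hypothetical lineage: CompoundName descends from SimpleName, which descends from Name.
parents = {"Name": None, "SimpleName": "Name", "CompoundName": "SimpleName", "Address": None}
threshold = 1.0
related = lineage_score("CompoundName", "SimpleName", parents) >= threshold
unrelated = lineage_score("CompoundName", "Address", parents) >= threshold
```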

[0120] If it is determined in step 1424 that the Datatype 
Comparison is not a Lineage Comparison, then it is determined
in step 1427 whether the Datatype Comparison is a
Tree (or Child Structure) Comparison. If so, then the Tree 
Comparison Algorithm that is referenced by the Tree Com- 
parison, as described at FIG. 10, is loaded from the SSS, in 
accordance with step 1428. The tree hierarchies from the 
Source and Target elements are then passed into the Tree 
Comparison Algorithm, in accordance with step 1429. 

[0121] In accordance with step 1430, it is then determined 
whether the score returned by the Tree Comparison Algo- 
rithm equals or exceeds the threshold defined by the Tree 
Comparison. If so, then the Success value remains at "true,"
in accordance with step 1431. In accordance with step 1432, 
it is then determined whether there are more comparisons in 
the series of the strategy. If there are more comparisons in 
the series, then the active comparison is set to the next 
comparison in the series, in accordance with step 1433, and 
the strategy evaluation process is reiterated, beginning with 
step 1403. If no more comparisons exist, then the Success 
value of "true" is returned, in accordance with step 1435. If 
it is determined in step 1430 that the score returned by the 
Tree Comparison Algorithm is lower than the threshold
score, then the Success value is set to "false," in accordance 
with step 1434, and is returned in accordance with step 1435. 

[0122] If it is determined that the Datatype Comparison is 
neither a Lineage nor Tree Comparison, or if it is determined 
that the active comparison is not a Datatype Comparison at 
all, then it is determined that the active comparison is 
undefined, in accordance with step 1436, and the strategy 
evaluation process is terminated. 
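
The determination of the comparison type, ending in the undefined case of step 1436, might be sketched as a lookup table. The string labels, the handler signature, and the exception name are assumptions introduced for illustration.

```python
from typing import Callable, Dict

# Assumption: each handler takes (source, target) and returns a Success value.
Handler = Callable[[str, str], bool]


class UndefinedComparisonError(ValueError):
    """Raised when the active comparison matches no known comparison type (step 1436)."""


def make_dispatcher(handlers: Dict[str, Handler]) -> Callable[[str, str, str], bool]:
    """Build a function that routes a comparison to the handler for its type."""
    def run(kind: str, source: str, target: str) -> bool:
        try:
            handler = handlers[kind]
        except KeyError:
            # no handler for this type: the evaluation is terminated
            raise UndefinedComparisonError(kind) from None
        return handler(source, target)
    return run


run = make_dispatcher({
    "element": lambda s, t: s.lower() == t.lower(),  # hypothetical element comparison
    "context": lambda s, t: (s, t) == ("Surname", "LastName"),  # hypothetical mapping
})
element_ok = run("element", "City", "city")
context_ok = run("context", "Surname", "LastName")
```

A table lookup makes the order in which types are tested irrelevant, whereas the figures test the types one after another.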

[0123] FIGS. 14A-14C illustrate one embodiment for
evaluating strategies, in accordance with the present inven- 
tion. It will be recognized by those skilled in the art that 
changes may be made to the steps shown in the figures, 
without departing from the scope of the invention. Examples 
of such changes include, but are not limited to, a change in 
the order in which the comparison type of the active com- 
parison is determined, and setting the active comparison to 
the next comparison in the series of a strategy, when it is 
determined that the active comparison is an undefined
comparison. 

[0124] Using the foregoing, the invention may be imple- 
mented using standard programming or engineering tech- 
niques including computer programming software, firm- 
ware, hardware or any combination or subset thereof. Any 
such resulting program, having a computer readable pro- 
gram code means, may be embodied or provided within one 
or more computer readable or usable media, thereby making 
a computer program product, i.e., an article of manufacture,
according to the invention. The computer readable media
may be, for instance, a fixed (hard) drive, disk, diskette,
optical disk, magnetic tape, semiconductor memory such as 
read-only memory (ROM), or any transmitting/receiving 
medium such as the Internet or other communication net- 
work or link. The article of manufacture containing the 
computer programming code may be made and/or used by 
executing the code directly from one medium, by copying 
the code from one medium to another medium, or by 
transmitting the code over a network. 

[0125] An apparatus for making, using or selling the 
invention may be one or more processing systems including, 
but not limited to, a central processing unit (CPU), memory, 
storage devices, communication links, communication
devices, server, I/O devices, or any sub-components or
individual parts of one or more processing systems, includ- 
ing software, firmware, hardware or any combination or 
subset thereof, which embody the invention as set forth in 
the claims. 

[0126] User input may be received from the keyboard, 
mouse, pen, voice, touch screen, or any other means by 
which a human can input data to a computer, including 
through other programs such as application programs. 

[0127] Although the present invention has been described 
in detail with reference to certain preferred embodiments, it 
should be apparent that modifications and adaptations to 
those embodiments may occur to persons skilled in the art 
without departing from the spirit and scope of the present 
invention. 

1. A method for sharing data between hierarchical data- 
bases, comprising: 

defining, configuring and storing datatypes; 

defining, configuring and storing hierarchical data struc- 
tures comprising the datatypes; 

establishing and storing a lineage for linking related 
datatypes into families; 

defining, configuring and storing measures of similarity 
and similarity match tolerances; 

defining, configuring and storing match strategies; 

transforming a source hierarchical data structure to a 
target hierarchical data structure by determining the 
similarity between the source and target data structure; 
and 

evaluating an effectiveness indicia of match strategies. 

2. The method of claim 1, further comprising manually 
defining, configuring and storing mappings between 
datatype elements. 

3. The method of claim 1, wherein the step of defining, 
configuring and storing datatypes comprises reading, writ- 
ing, and deleting a datatype name, a parent datatype refer- 
ence, and an element, the element comprising an element 
name, a datatype reference, a positional reference, an element
reference, and an alias name.

4. The method of claim 1, wherein the step of defining, 
configuring and storing hierarchical data structures com- 
prises specifying a parent datatype reference and an element 
of a datatype having a datatype reference, an element 
reference and an alias name. 

5. The method of claim 1, wherein the step of defining,
configuring and storing hierarchical data structures comprises
nesting datatypes into groups of higher level schema
datatypes.

6. The method of claim 1, wherein the step of establishing
and storing a lineage for linking related datatypes into 
families comprises locating a common datatype family and 
locating a common datatype ancestor between a datatype 1 
and a datatype 2. 

7. The method of claim 6, wherein locating a common 
datatype family between the datatype 1 and the datatype 2 
comprises: 

walking up a family tree of the datatype 1 to a root;

walking up a family tree of the datatype 2 to a root; and




determining if the root of datatype 1 is the same as the root 
of datatype 2. 

8. The method of claim 1, wherein the step of defining, 
configuring and storing measures of similarity and similarity 
match tolerances comprises specifying a comparison algo- 
rithm by identifying an algorithm name, an implementation, 
and implementation parameters. 

9. The method of claim 1, wherein the step of defining, 
configuring and storing match strategies comprises specify- 
ing comparisons by context, element, data type, and attribute 
for each of the strategies, and ordering the strategies accord- 
ing to accuracy. 

10. The method of claim 1, wherein the step of trans- 
forming a source hierarchical data structure to a target 
hierarchical data structure comprises: 

receiving a source data element from the source hierar- 
chical data structure and a target data element from the 
target hierarchical data structure; 

determining whether the source data element has at least 
one source child data element and the target data 
element has at least one target child data element; 

copying the source data element to the target data element 
if the source data element has no source child data 
elements and the target data element has no target child 
data elements; 

separating the source data element and applying the 
separated source data element to at least one target 
child data element if the source data element has no 
source child data elements and the target data element 
has at least one target child data element; 

concatenating the at least one source child data element 
into one value and applying the one value to the target 
data element if the source data element has at least one 
source child data element and the target data element 
has no target child data elements;

applying a source child data element to a target child data 
element when the source child data element matches 
the target child data element if a source data element 
has at least one source child data element and a target 
data element has at least one target child data element; 
and 

repeating the previous steps until all target data elements 
have been examined for each of a group of selected 
strategies. 

11. The method of claim 10, wherein the step of separat- 
ing the source data element further comprises separating the 
source data elements into tokens and applying the tokens to 
at least one target child data element. 

12. The method of claim 10, wherein the step of separat- 
ing the source data element further comprises using a best-fit 
algorithm to separate and apply the data. 

13. The method of claim 2, wherein the step of defining, 
configuring and storing mappings comprises: 

inputting source and target datatypes and retrieving an 
associated mapping; 

inputting source and target datatypes and removing an 
associated mapping; 

inputting a mapping specification for storing; and 



inputting source data schema, target data schema, source 
data, and target data, and retrieving an associated mapping.

14. The method of claim 1, wherein the step of evaluating 
an effectiveness indicia of match strategies comprises: 

determining a success value of a context comparison 
between source and target datatypes based on a map- 
ping between source and target schema; 

determining a success value of an element comparison 
between source and target datatypes based on a name 
comparison of source and target data elements; 

determining a success value of an attribute comparison 
between source and target datatypes based on a name 
comparison of source and target data attributes; 

determining a success value of a datatype comparison 
between source and target datatypes based on a lineage 
comparison of source and target datatypes; 

determining a success value of a tree structure comparison 
between source and target datatype tree structures; and 

aggregating the success values obtained from the com- 
parisons resulting from at least one match strategy to 
determine an effectiveness indicia for the at least one 
match strategy. 

15. A computer program embodied on a computer-readable
medium incorporating the method of claim 1.

16. A system for sharing data between hierarchical data- 
bases, comprising: 

a datatype services facility for defining, configuring and 
storing datatypes and hierarchical data structures, and 
establishing and storing lineage for linking related 
datatypes into families; 

a user-defined mapping services facility for defining, 
configuring and storing mappings between data ele- 
ments; 

a similarity scoring services facility for defining, config- 
uring and storing measures of similarity and similarity 
match tolerances; and 

a tree transformation engine for defining, configuring and 
storing match strategies, transforming a source hierar- 
chical data structure to a target hierarchical data struc- 
ture by determining the similarity between the source 
and target data structure, and evaluating an effective- 
ness indicia of match strategies. 

17. The system of claim 16, further comprising at least 
one match strategy from the tree transformation engine that 
is stored in the similarity scoring services facility. 

18. The system of claim 16, further comprising at least one
match strategy from the similarity scoring services facility
that is provided to the tree transformation engine. 

19. The system of claim 16, wherein each of the match 
strategies comprises at least one comparison utility selected
from the group consisting of a context comparison utility, an 
element comparison utility, an attribute comparison utility, a 
datatype lineage comparison utility, and a datatype tree 
structure comparison utility. 

20. The system of claim 16, wherein the match strategies 
are stored in the similarity scoring services facility in 
descending order by the effectiveness indicia of each match 
strategy. 
21. The system of claim 20, wherein the effectiveness
indicia is a match strategy accuracy. 

22. The system of claim 16, wherein a user may explicitly 
define a match between datatype elements using the user- 
defined mapping services facility. 

23. A method for applying data from a first hierarchical 
data structure to a second hierarchical data structure, com- 
prising: 

receiving at least one source element from the first 
hierarchical data structure and at least one target element
from the second hierarchical data structure;

determining whether source elements and target elements 
have child elements;

copying data from a source element to a target element; 

separating data from a source element and applying the 
data to at least one child of a target element; 

comparing a child of a source element to a child of a target 
element and determining a match; and 

copying data from a source child element to a target child 
element where a match is determined. 

24. The method of claim 23, further comprising receiving 
a definition and configuration of a datatype from a user. 

25. The method of claim 23, further comprising receiving 
a definition and configuration of a source and target datatype 
mapping from a user. 

26. The method of claim 23, further comprising receiving 
a definition and configuration of a match strategy from a 
user. 

27. A computer program embodied on a computer-read- 
able medium incorporating the method of claim 23. 

28. A system for applying data from a first hierarchical 
data structure to a second hierarchical data structure, com- 
prising: 

a means for receiving at least one source element from the 
first hierarchical data structure and at least one target 
element from the second hierarchical data structure;

a means for determining whether source elements and 
target elements have child elements;

a means for copying data from a source element to a target 
element; 

a means for separating data from a source element and 
applying the data to at least one child of a target 
element; 

a means for comparing a child of a source element to a 
child of a target element and determining a match; and 



a means for copying data from a source child element to 
a target child element, where a match is determined. 

29. A computer-readable medium containing a data struc- 
ture for sharing data between hierarchical databases, com- 
prising: 

a source hierarchical data structure comprising source 
datatypes; 

a source lineage for linking related source datatypes into
families; 

a target hierarchical data structure comprising target 
datatypes; 

a target lineage for linking related target datatypes into 
families; 

measures of similarity and similarity match tolerances;

match strategies;

results of a similarity transformation and an effectiveness 
indicia of match strategies. 

30. The computer-readable medium of claim 29, further
comprising mappings between source datatype elements and 
target datatype elements. 

31. The computer-readable medium of claim 29, wherein 
the source and target datatypes each comprise a datatype 
name, a parent datatype reference, and an element. 

32. The computer-readable medium of claim 31, wherein 
the element comprises an element name, a datatype refer- 
ence, a positional reference, an element reference, and an 
alias name. 

33. The computer-readable medium of claim 29, wherein 
the source and target hierarchical data structures each com- 
prise a parent datatype reference and an element of a 
datatype having a datatype reference, an element reference, 
and an alias name. 

34. The computer-readable medium of claim 29, wherein 
the measures of similarity and similarity match tolerances 
comprise a comparison algorithm that identifies an algo- 
rithm name, an implementation, and implementation param- 
eters. 

35. The computer-readable medium of claim 29, wherein 
the match strategies comprise a comparison by context, 
comparison by element, comparison by data type, and 
comparison by attribute for each of the strategies, and an 
ordering of strategies according to accuracy. 

36. The computer-readable medium of claim 30, wherein 
the mappings comprise mappings associated with source and 
target datatypes, mapping specifications, source data 
schema, target data schema, source data, and target data. 

* * * * *